1st FMTC-Br, August, 2018.

Machine Learning the Greeks

  1. Problem: How to obtain the Greeks (without going to Greece)
  2. Methodology

    2.1 Introducing the Greeks

    2.2 A tour of Gaussian Processes

  3. Results

    3.1 Synthetic data

    3.2 European options S&P500

  4. Discussion

The Problem: Differentiating Data

Portfolios and Risk

  • Holding derivatives exposes you to market risk

  • One can hedge exposure using appropriate techniques

  • The most basic technique uses derivative sensitivities (the Greeks)

  • But how can one compute these?

The Greeks

  • Collection of quantities used to measure sensitivity!

    • W.r.t. the underlying asset
  • Delta (\(\Delta\))

    • Rate of change: \(\frac{\partial P}{\partial S_0}\)
    • Fraction of the underlying asset needed to hedge

The Greeks

  • Theta (\(\Theta\))
    • Rate of change (w.r.t. time): \(\frac{\partial P}{\partial \tau}\)
    • Decreasing in time (no surprise in here)
  • Gamma (\(\Gamma\))
    • Derivative of \(\Delta\): \(\frac{\partial^2 P}{\partial S_0^2}\)
    • Refers to the convexity w.r.t. the underlying asset
    • Higher \(\Gamma\) \(\Rightarrow\) more frequent reassessment of the portfolio position

How to obtain the Greeks?

  • Set up a market model:

  • Closed-form formulae
    • For simple classical models only :(
  • More sophisticated models?
    • Require cumbersome calibration procedure
    • Require some method of approximating derivatives
    • Very computationally intensive
  • What if we didn't use a market model at all?
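
For the simple classical case the closed-form route is direct. A minimal sketch of the Black-Scholes Delta and Gamma for a European put (standard textbook formulas; the function and variable names here are mine):

```python
from math import log, sqrt, exp, erf, pi

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def bs_put_greeks(S0, K, r, sigma, tau):
    """Closed-form Black-Scholes Delta and Gamma of a European put."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    delta = norm_cdf(d1) - 1.0                       # put Delta, in (-1, 0)
    gamma = norm_pdf(d1) / (S0 * sigma * sqrt(tau))  # same for call and put
    return delta, gamma
```

Anything beyond this baseline model requires calibration and numerical differentiation, which is what motivates the data-driven approach that follows.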

Modelling the Data

Let's consider a stochastic model for functional data.

\[ \begin{eqnarray}\tag{01} \mathbf{y} = f(\mathbf{X}) + \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \end{eqnarray} \]

  • Assumptions about \(f(\cdot)\) can lead to 2 types of models (Rasmussen 2004)

    • Option 1: Fix a class of functions
      • Loss of flexibility
    • Option 2: Assign prior probabilities to a space of functions
      • Very flexible

      • Not possible to test all possible sets of functions

Gaussian Processes (GP): a helicopter tour

  • A GP generalizes the Gaussian distribution
    • A stochastic model of functional data
  • Def: A GP is a collection of random variables \(\{f(\mathbf{x})\}_{\mathbf{x} \in \mathbb{R}^D}\) such that for every finite collection \(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(N)}\), the vector \((f(\mathbf{x}^{(i)}))_{i=1}^N\) is Gaussian.

  • Since everything is Gaussian, taking a "Bayesian" approach is easy
  • Closed-form posterior distributions
  • Plenty of useful properties

Gaussian Processes: a helicopter tour (cont.)

  • A GP is completely specified by its mean and covariance functions

  • \[ \begin{eqnarray}\tag{02} m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})] \end{eqnarray} \]

  • \[ \begin{eqnarray}\tag{03} k(\mathbf{x},\mathbf{x'}) = \text{cov} (f(\mathbf{x}),f(\mathbf{x'})) \end{eqnarray} \]

  • Notation: \(f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}),\ k(\mathbf{x},\mathbf{x'}))\)

  • \(k\) defines the correlation structure between the inputs!
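
One way to see how \(k\) fixes the correlation structure is to sample random functions from the prior. A small sketch with a Gaussian kernel (numpy only; function and variable names are mine):

```python
import numpy as np

def gaussian_kernel(x, y, theta=1.0):
    """k(x, x') = exp(-|x - x'|^2 / (2 theta^2)): nearby inputs are strongly correlated."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2.0 * theta**2))

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 50)
# Small jitter on the diagonal keeps the Gram matrix numerically positive definite
K = gaussian_kernel(xs, xs) + 1e-8 * np.eye(len(xs))
# Each draw from N(0, K) is one random function evaluated on the grid xs
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
```

Larger \(\theta\) makes the Gram matrix closer to constant, hence the sampled curves flatter.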

Gaussian Processes: a helicopter tour (cont.)

  • Given the training data, we can use Bayes' theorem to obtain a predictive posterior
    • It will be another Gaussian process!
    • \(f(\mathbf{x})|\mathbf{y},\mathbf{X} \sim \mathcal{GP}(\tilde{m}(\mathbf{x}), \tilde{k}(\mathbf{x}, \mathbf{x'}))\)
    • The new mean and covariance functions combine information from the prior and the data.
      • \(\tilde{m}(\mathbf{x})\) is the model prediction using \((\mathbf{y}, \mathbf{X})\)
      • \(\tilde{k}(\mathbf{x}, \mathbf{x'})\) measures the accuracy of the prediction (a proxy for the MSE of \(\tilde{m}\) at \(\mathbf{x}\))

Gaussian Processes: a helicopter tour (cont.)

Toy example: European Put Prices (Black-Scholes model)

Specification of \(m(\mathbf{x})\), \(\ k(\mathbf{x},\mathbf{x'})\)

The conditional distribution \(f(\mathbf{x})|\mathbf{X}, \mathbf{y}\) is given by

  • \[ \begin{eqnarray} f(\mathbf{x}) \,|\, \mathbf{X}, \mathbf{y} \sim \mathcal{GP}(\tilde{m}(\mathbf{x}), \tilde{k}(\mathbf{x}, \mathbf{x'})), \quad \text{where} \nonumber \end{eqnarray} \]

  • \[ \begin{eqnarray} \tilde{m}(\mathbf{x}) = k(\mathbf{x}, \mathbf{X})[k(\mathbf{X},\mathbf{X})+\Sigma]^{-1}\mathbf{y} \quad \text{and} \nonumber \end{eqnarray} \]

  • \[ \begin{eqnarray} \tilde{k}(\mathbf{x},\mathbf{x'}) = k(\mathbf{x},\mathbf{x'}) - k(\mathbf{x},\mathbf{X})[k(\mathbf{X},\mathbf{X})+ \Sigma]^{-1}k(\mathbf{X},\mathbf{x'}). \end{eqnarray} \]
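
These two formulas translate directly into a few lines of linear algebra. A sketch with a Gaussian kernel (numpy; function names and the toy data are mine):

```python
import numpy as np

def k_gauss(x, y, theta=1.0):
    """Gaussian kernel between two 1-D grids of inputs."""
    return np.exp(-(x[:, None] - y[None, :])**2 / (2.0 * theta**2))

def gp_posterior(X, y, Xs, theta=1.0, noise_var=1e-6):
    """Posterior mean/covariance at test points Xs given training data (X, y):
       m~(x)    = k(x, X) [k(X, X) + Sigma]^{-1} y
       k~(x,x') = k(x, x') - k(x, X) [k(X, X) + Sigma]^{-1} k(X, x')
    """
    K = k_gauss(X, X, theta) + noise_var * np.eye(len(X))  # k(X, X) + Sigma
    Ks = k_gauss(Xs, X, theta)                             # k(x, X)
    mean = Ks @ np.linalg.solve(K, y)
    cov = k_gauss(Xs, Xs, theta) - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# Toy check: with almost no noise, the posterior mean interpolates the data
X = np.linspace(0.0, 6.0, 25)
y = np.sin(X)
mean, cov = gp_posterior(X, y, X)
```

In practice one factors \(k(\mathbf{X},\mathbf{X})+\Sigma\) once (e.g. by Cholesky) and reuses it; the direct solves above keep the correspondence with the formulas visible.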

Covariance Function (the kernel)

  • Defines how two distinct points relate to each other

  • If \(k\) is known and \(m(\mathbf{x}) = 0\), the problem reduces to applying the formulas from the previous slide

    • In practice \(k\) needs to be inferred (by fitting its hyperparameters)
  • The choice of kernel has an impact on the estimates

Covariance Function (the kernel)

  • In our examples \(k\) could be

  • \[ \begin{eqnarray} k_{G}(|x-y|) \equiv \exp \left(-\frac{|x-y|^2}{2\theta^2} \right) \quad \text{(Gaussian)}\nonumber \end{eqnarray} \]

  • \[ \begin{eqnarray} k_{M_{3/2}}(|x-y|) \equiv \left(1 + \frac{\sqrt{3}|x-y|}{\theta} \right)\exp \left\{-\frac{\sqrt{3}|x-y|}{\theta} \right\} \quad \text{(Matérn$_{3/2}$)} \nonumber \end{eqnarray} \]

  • \[ \begin{eqnarray} k_{M_{5/2}}(|x-y|) \equiv \left(1 + \frac{\sqrt{5}|x-y|}{\theta} + \frac{5 |x-y|^2}{3\theta^2} \right)\exp \left\{-\frac{\sqrt{5}|x-y|}{\theta} \right\} \quad \text{(Matérn$_{5/2}$)} \nonumber \end{eqnarray} \]
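
As plain functions of the distance \(r = |x - y|\), the three kernels are a few lines each (a sketch; function names are mine):

```python
import numpy as np

def k_gauss(r, theta=1.0):
    """Gaussian (squared exponential): infinitely differentiable sample paths."""
    return np.exp(-r**2 / (2.0 * theta**2))

def k_matern32(r, theta=1.0):
    """Matern 3/2: once-differentiable sample paths."""
    a = np.sqrt(3.0) * np.abs(r) / theta
    return (1.0 + a) * np.exp(-a)

def k_matern52(r, theta=1.0):
    """Matern 5/2: twice-differentiable sample paths."""
    a = np.sqrt(5.0) * np.abs(r) / theta
    return (1.0 + a + 5.0 * r**2 / (3.0 * theta**2)) * np.exp(-a)
```

All three equal 1 at \(r = 0\) and decay with distance; the Matérn family controls how rough the sampled functions are, which matters when the goal is to differentiate the fit.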

Covariance Function (the kernel) (cont.)

Mean function

  • Adds complexity to the model
  • Could be constant, linear, polynomial, etc.
    • More complexity \(\Rightarrow\) higher risk of overfitting
  • A constant mean function shrinks out-of-sample estimates towards zero (the prior mean)
  • Ordinary kriging vs. universal kriging

The length-scale

  • Drives the smoothness of the process for a given kernel

    • Bigger \(\theta\) \(\Rightarrow\) flatter curves
    • In the predictive mean, it balances prior and data information

Some comments on differentiability

Proposition 1.

Given \(f(\mathbf{x})\sim \mathcal{GP}(m(\mathbf{x}),k(\mathbf{x},\mathbf{x'}))\) such that \(\frac{\partial^2 k}{\partial x_i\partial x'_i}(\mathbf{x},\mathbf{x'})\) and \(\frac{\partial m}{\partial x_i}(\mathbf{x})\) exist, then

  • \[ \begin{eqnarray} \frac{\partial f}{\partial x_i}(\mathbf{x})\sim \mathcal{GP}\left(\frac{\partial m}{\partial x_i}(\mathbf{x}),\frac{\partial^2 k}{\partial x'_i\partial x_i}(\mathbf{x},\mathbf{x'})\right)\nonumber \end{eqnarray} \]

  • Distribution of the derivative of a GP

  • Under the same hypotheses, conditioning on observed values \(f(\mathbf{X})\) gives

  • \[ \begin{eqnarray} \frac{\partial f}{\partial x_i}(\mathbf{x})\,\big|\,\mathbf{X},f(\mathbf{X}) \sim \mathcal{GP}&\big(\frac{\partial k}{\partial x_i}(\mathbf{x},\mathbf{X})k(\mathbf{X},\mathbf{X})^{-1}f(\mathbf{X}),\ \frac{\partial^2 k}{\partial x_i\partial x'_i}(\mathbf{x},\mathbf{x'}) \nonumber\\ &- \frac{\partial k}{\partial x_i}(\mathbf{x},\mathbf{X})k(\mathbf{X},\mathbf{X})^{-1}\frac{\partial k}{\partial x'_i}(\mathbf{X},\mathbf{x'})\big)\label{eq-2.13} \end{eqnarray} \]
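
For the Gaussian kernel the derivative is available in closed form, \(\partial k/\partial x = -(x - x')\,k/\theta^2\), so the conditional mean of the derivative is again a small linear-algebra computation. A sketch that recovers the derivative of a smooth function from its samples, without ever differencing the data (numpy; names and the toy data are mine):

```python
import numpy as np

def k_gauss(x, y, theta):
    return np.exp(-(x[:, None] - y[None, :])**2 / (2.0 * theta**2))

def dk_dx(x, y, theta):
    """dk/dx for the Gaussian kernel: -(x - x') / theta^2 * k(x, x')."""
    return -(x[:, None] - y[None, :]) / theta**2 * k_gauss(x, y, theta)

def gp_derivative_mean(X, fX, Xs, theta=1.0, noise_var=1e-6):
    """Posterior mean of df/dx:  dk/dx(x, X) k(X, X)^{-1} f(X)."""
    K = k_gauss(X, X, theta) + noise_var * np.eye(len(X))
    return dk_dx(Xs, X, theta) @ np.linalg.solve(K, fX)

# Toy check: observe sin on a grid and recover its derivative, cos
X = np.linspace(0.0, 2.0 * np.pi, 40)
d = gp_derivative_mean(X, np.sin(X), np.array([np.pi]))
```

Applied to observed option prices as a function of \(S_0\), the same computation yields \(\Delta\) directly, with the conditional covariance providing uncertainty bands for free.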

Getting some Deltas (1)

Getting some Deltas (2)

What are our constraints?

  • Positivity: prices are positive
  • Monotonicity: the put price is decreasing in \(S_0\) (\(\Delta \le 0\))
  • Convexity: a no-arbitrage constraint (\(\Gamma \ge 0\))
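
These shape constraints can at least be checked on a fitted curve with finite differences. A minimal sketch for a put-price curve on a uniform grid of \(S_0\), reading the monotonicity constraint as the put price decreasing in \(S_0\) (function name is mine):

```python
import numpy as np

def check_put_curve(P):
    """Check positivity, monotonicity (price decreasing in S0) and convexity
    of predicted put prices P on a uniform grid of S0, via finite differences."""
    positive = bool(np.all(P >= 0.0))
    monotone = bool(np.all(np.diff(P) <= 1e-12))    # Delta <= 0
    convex = bool(np.all(np.diff(P, 2) >= -1e-12))  # Gamma >= 0
    return positive, monotone, convex

# The intrinsic value max(K - S, 0) satisfies all three constraints
S = np.linspace(50.0, 150.0, 101)
flags = check_put_curve(np.maximum(100.0 - S, 0.0))
```

The small tolerances avoid flagging harmless floating-point noise as a violation.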

Results: synthetic

  • Data generated from a BS model

    • We calculated the 'true' Deltas

    • Training vs. test data split

    • Different maturities, strikes, kernels, length-scales, test points in and out-of-sample

    • Evaluated in terms of rMSE, Bias and Coverage

    • Checked violations in positivity, monotonicity and convexity (pathwise)

Results: synthetic - one dimension

Results: synthetic - one dimension (2)

Results: synthetic - one dimension (2)

Results: synthetic - one dimension (3)

Results: synthetic - two dimension (1)

Results: synthetic - two dimension (2)

Results: synthetic - two dimension (3)

Results: synthetic - two dimension (4)

Results: Real data

  • Let's see how the GP performs on real data

  • Data description

    • End-of-day options data on the S&P 500
    • Data for the entire month of October 2015
    • Bid and ask quotes for each option
    • Maturities ranging from one week to a year and a half

Coming back to the problem: calculating derivatives

Results: Real data

Results: Real data

Wrapping-up

  • Gaussian Processes are a flexible, data-driven method
  • They learn option prices directly from data, without a market model
  • Differentiating the fitted GP provides sensible, model-free Greek estimates

References

Rasmussen, Carl Edward. 2004. “Gaussian Processes in Machine Learning.” In Advanced Lectures on Machine Learning, 63–71. Springer.